Step 5 (select patents related to your project work from the MAIN database)
Start selecting the patents related to your project work to build the database that you will use for the analysis in the following steps. Put these patents together, but please record every search strategy and keyword that you use. On April 22 (I will be in Pisa) we will review together how you are handling this step, which patents you are selecting, and how relevant they are, so that we can see whether you are missing any relevant keywords or making any mistakes before proceeding with the analysis.
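One possible way to record each search strategy is to keep a small log table with one row per strategy. The sketch below is only an illustration: the `log_strategy()` helper, its column names, and the example note are hypothetical and not part of the original workflow.

```r
library(tibble)
library(dplyr)

# Hypothetical helper: append one row describing a search strategy to a log.
log_strategy <- function(log, fields, keywords, n_selected, note = "") {
  bind_rows(
    log,
    tibble(
      logged_on  = Sys.Date(),
      fields     = paste(fields, collapse = ", "),
      keywords   = paste(keywords, collapse = "; "),
      n_selected = n_selected,
      note       = note
    )
  )
}

search_log <- tibble()  # start with an empty log
search_log <- log_strategy(
  search_log,
  fields     = c("title", "abstract", "claims"),
  keywords   = c("cosmetic", "nail", "varnish"),
  n_selected = NA_integer_,  # placeholder: replace with nrow() of the selection
  note       = "first pass"
)
print(search_log)
```

The log can be saved with `write_csv()` next to the selected patents, so that each output file can be traced back to the keywords that produced it.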
library(tidyverse)
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
-- Attaching packages ----------------------------------------------------------------------------------------------------------------- tidyverse 1.3.1 --
√ ggplot2 3.3.5 √ purrr 0.3.4
√ tibble 3.1.6 √ dplyr 1.0.8
√ tidyr 1.2.0 √ stringr 1.4.0
√ readr 2.1.2 √ forcats 0.5.1
-- Conflicts -------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
data <- read_csv("../data/data_subset1.csv")
Rows: 216986 Columns: 11
-- Column specification ----------------------------------------------------------------------------------------------------------------------------------
Delimiter: ","
chr (7): filename, ipc_classes, assignee, inventors, title, abstract, claims
dbl (1): docdb_family_id
date (3): filing_date, publication_date, priority_date
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
data %>% head(20)
# Keywords recorded for this search strategy (matched as case-insensitive substrings).
keywords <- c(
  "cosmetic", "beauty", "body", "skin", "nail", "molecule", "particle",
  "fingernail", "cleansing", "manicure", "pharmacological", "technolog",
  "nano", "varnish", "polish", "altering"
)
keyword_pattern <- regex(str_c(keywords, collapse = "|"), ignore_case = TRUE)

# Search claims, abstract and title together; unite() drops these three columns.
tibble_filtrato <- data %>%
  unite(col = "united", claims, abstract, title, sep = " ") %>%
  filter(str_detect(united, keyword_pattern)) %>%
  select(-united)
tibble_filtrato %>% head(20)
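To judge the relevance of the selection, it may help to count how many documents each keyword matches, since a very broad keyword can dominate the result. The sketch below uses a made-up toy table (not real patents) and a shortened keyword list; all names in it are assumptions chosen for illustration.

```r
library(dplyr)
library(purrr)
library(stringr)
library(tibble)

# Toy stand-in for the patent table (invented rows, not real patents).
toy <- tibble(
  filename = c("a", "b", "c", "d"),
  title    = c("Nail varnish composition", "Engine body mount",
               "Skin cleansing gel", "Car body panel"),
  abstract = c("A polish for fingernails.", "A steel body part.",
               "Cosmetic gel.", "A painted panel."),
  claims   = c("A varnish.", "A mount.", "A gel for the skin.", "A panel.")
)

keywords <- c("cosmetic", "body", "skin", "nail", "varnish")

# Same text the filter searches: claims, abstract and title pasted together.
text <- str_to_lower(paste(toy$claims, toy$abstract, toy$title))

# Number of documents in which each keyword appears at least once.
keyword_hits <- tibble(
  keyword = keywords,
  n_docs  = map_int(keywords, ~ sum(str_detect(text, fixed(.x))))
) %>%
  arrange(desc(n_docs))

print(keyword_hits)
```

In this toy table "body" matches two documents, both unrelated to cosmetics, which is the kind of pattern worth checking on the real data before keeping a keyword.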
# Join back to the full table to recover the text columns dropped by unite(), then save.
inner_join(data, tibble_filtrato, suffix = c(".x", ".y")) %>%
  write_csv("../data/subest1_filtrato.csv")
Joining, by = "filename"
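Because the keywords are matched as substrings, short ones can also match inside unrelated words. A minimal sketch of the difference between substring and whole-word matching follows; the example titles are invented, and whether whole-word matching suits this project is an open question for the review.

```r
library(stringr)

# Invented titles used only to illustrate the matching behaviour.
titles <- c("Nail polish remover", "Snail trap for gardens", "Polishing pad for wafers")

# Substring search: also matches inside longer words.
nail_substring <- str_detect(str_to_lower(titles), "nail")

# Whole-word search: \\b marks a word boundary.
nail_word   <- str_detect(titles, regex("\\bnail\\b", ignore_case = TRUE))
polish_word <- str_detect(titles, regex("\\bpolish\\b", ignore_case = TRUE))

print(nail_substring)
print(nail_word)
print(polish_word)
```

Whole-word patterns also exclude plurals and compounds such as "nails" or "fingernails", so a pattern like `\\bnails?\\b` may be needed if this approach is adopted.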